July 9-12 2019



Let’s start

Welcome

Day 1: Management of clinical trial data

-Introduction (Arsenio)
-Data management plan (Arsenio)
-Data entry (Arsenio)
-Data traceability (Arsenio)
-Practical session (Arsenio and Joe)

Day 2:

-Managing data quality and validation (Joe)
-Data coding (Joe)
-Data sharing (Joe)
-Practical session (Joe and Arsenio)

Day 3:

Part 1

-Study registration (Joe)
-Information in the protocol (Joe)
-Statistics and statistical analysis plan (SAP) (Joe)
-Interim analysis (Joe)
-Publication (Joe)
-Data processing (Joe)
-Data cleaning (Joe)

Day 3:

Part 2

  • Statistical analysis (descriptive, exploratory, confirmatory) (Arsenio)
  • Communication (Arsenio)
  • Practical session (Arsenio)

Getting started in R

5 reasons to use R

Reason 1: It’s free

5 reasons to use R

Reason 2: It’s “open source”

mxcursos.com

5 reasons to use R

Reason 3: It’s beautiful

https://www.r-bloggers.com/a-map-of-the-world-by-tweets/

5 reasons to use R

Reason 3: It’s beautiful

http://www.isric.org/sites/default/files/image01.png

5 reasons to use R

Reason 3: It’s beautiful

https://ryouready.wordpress.com/2015/04/14/beautiful-plots-while-simulating-loss-in-two-part-procrustes-problem/

5 reasons to use R

Reason 3: It’s beautiful

http://asbcllc.com/visualizations/weather/gotham_2014/plot.svg

5 reasons to use R

Reason 4: It’s powerful

https://medium.com/@itamargilad/analyze-your-data-like-a-pro-with-r-e5e89a64564a#.thq1nkp65

5 reasons to use R

Reason 5: It’s fun

http://www.hlgjyl888.com/group/fun-pictures/



Day one

Day 1: Management of clinical trial data

-Introduction (Arsenio)
-Data management plan (Arsenio)
-Data entry (Arsenio)
-Data traceability (Arsenio)
-Practical session (Arsenio and Joe)

Practical session

Let’s generate the following together:

-A research hypothesis
-A basic DMP
-A basic EDC (electronic data capture) entry system (google sheets)
-A scripted audit / log system (R)

A research hypothesis

This should be a falsifiable, generalized statement about the people in this room.

Example: the older people are, the more they like dancing.

A basic data management plan

This should be a list of bullet points (5-15), including

  • What is the work to be performed?
  • What guidelines will apply to data collection?
  • The specific questionnaire
  • Edit guidelines
  • Roles
  • Circumstances under which plan would be revised
  • The software used for collection and analysis
  • Study database lock

A basic EDC

  • ODK
  • OpenClinica
  • OpenHDS
  • Custom solutions
  • Google sheets

A scripted audit/log system

(If time permits, we’ll build this together later in the week)
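As a starting point, a scripted audit log can be sketched as a function that changes one value and records who changed it, when, and what the old value was. This is a minimal sketch, not the system built in the session: the function names (`new_audit_log`, `update_value`), the column names, and the toy `study` data are all assumptions made for illustration.

```r
# Create an empty audit log: one row per change.
new_audit_log <- function() {
  data.frame(
    timestamp = character(0),
    user      = character(0),
    row       = integer(0),
    field     = character(0),
    old_value = character(0),
    new_value = character(0),
    stringsAsFactors = FALSE
  )
}

# Change one cell and append a log entry describing the change.
update_value <- function(data, log, row, field, new_value, user) {
  entry <- data.frame(
    timestamp = format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
    user      = user,
    row       = as.integer(row),
    field     = field,
    old_value = as.character(data[row, field]),
    new_value = as.character(new_value),
    stringsAsFactors = FALSE
  )
  data[row, field] <- new_value
  list(data = data, log = rbind(log, entry))
}

# Toy data (hypothetical): the third age is an obvious entry error.
study <- data.frame(id = 1:3, age = c(30, 26, 310))
log <- new_audit_log()

result <- update_value(study, log, row = 3, field = "age",
                       new_value = 31, user = "data_manager")
study <- result$data
log <- result$log
log
```

Because the function returns both the corrected data and the extended log, every correction made through it leaves a trace. Edits made directly to the data frame would bypass the log.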



Day one set up

Installation and set-up

Getting familiar with RStudio

First code

Let’s write some code!

2 + 2

First code

Let’s write some code!

2 + 2
[1] 4

First code

Let’s write some code!

x <- c(1,2,3,4,5)

First code

Let’s write some code!

x
[1] 1 2 3 4 5

First code

Let’s write some code!

barplot(x)

Packages

http://www.rgbstock.com/bigphoto/mB1JGWC/Box

Packages

A “package” is simply a collection of code written by someone else.

It’s what makes R powerful, but also confusing.

Installing packages

You only have to install a package one time.

install.packages('tidyverse')

Using packages

You have to call the library function in every new R session where you use a package.

library(tidyverse)

Writing library just means “I am going to use this package”.

Activity: let’s get some packages

Install the following packages:

tidyverse
maptools
RColorBrewer
ggthemes
knitr
leaflet
raster
rgdal
rgeos
rmarkdown
sp
tidyr
gsheet
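Since a package only needs to be installed once, one way to handle a list like this is to check which packages are still missing before installing. The sketch below assumes a helper name, `missing_packages`, that is not part of R. It only installs when run interactively, because installation needs an internet connection, and some packages may not be available for every R version.

```r
# The packages from the list above.
wanted <- c("tidyverse", "maptools", "RColorBrewer", "ggthemes", "knitr",
            "leaflet", "raster", "rgdal", "rgeos", "rmarkdown", "sp",
            "tidyr", "gsheet")

# Return the packages that are not yet installed on this computer.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(wanted)
to_install

# Install only what is missing, and only in an interactive session.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```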

Creating objects

a <- 1
a + 3

Creating objects

Let’s create an object called “ages”, with the age of everyone in the room

ages <- c(30, 26, 31, 39, 45, 27, 28, 22, 19, 30, 35)

Exploring objects

How do we view our ages object?

ages

Exploring objects

How do we view our ages object?

ages
 [1] 30 26 31 39 45 27 28 22 19 30 35

Exploring objects

How do we view just the first element of our ages object?

ages[1]

Exploring objects

How do we view just the first element of our ages object?

ages[1]
[1] 30

Exploring objects

How do we sort our ages object?

sorted_ages <- sort(ages)
sorted_ages
 [1] 19 22 26 27 28 30 30 31 35 39 45

Exploring objects

How do we get the minimum, maximum, average age?

min(ages)
max(ages)
mean(ages)

Exploring objects

min(ages)
[1] 19
max(ages)
[1] 45
mean(ages)
[1] 30.18182

Visualizing objects

How do we visualize our ages object?

hist(ages)

Multi-dimensional objects

Previously, we looked at a one-dimensional object: ages.

But most data is two-dimensional: rows and columns.

This is called a data frame.

Let’s play around with some real data.

Multi-dimensional objects

Let’s create a simple data frame

www.databrew.cc/frangos.csv

frangos <- databrew::frangos

Frangos

head(frangos)
# A tibble: 6 x 4
  diet  chick  days grams
  <chr> <int> <dbl> <int>
1 corn      1 0.192    42
2 corn      1 1.01     51
3 corn      1 4.52     59
4 corn      1 6.72     64
5 corn      1 8.14     76
6 corn      1 9.11     93

Frangos

Let’s explore.

Brackets: []

Let’s filter

Let’s visualize
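The three steps (brackets, filtering, visualizing) can be sketched on a small data frame. To keep the example self-contained, it rebuilds only the six rows printed by head(frangos); the name `frangos_head` and the 60-gram threshold are assumptions for illustration.

```r
# The six rows shown by head(frangos), typed in by hand.
frangos_head <- data.frame(
  diet  = rep("corn", 6),
  chick = rep(1L, 6),
  days  = c(0.192, 1.01, 4.52, 6.72, 8.14, 9.11),
  grams = c(42L, 51L, 59L, 64L, 76L, 93L),
  stringsAsFactors = FALSE
)

# Brackets: [rows, columns]
frangos_head[1, ]          # first row, all columns
frangos_head[, "grams"]    # all rows, one column
frangos_head[2, "grams"]   # a single cell

# Filter: keep only the rows where the weight is above 60 grams.
heavy <- frangos_head[frangos_head$grams > 60, ]
heavy

# Visualize: weight against age in days.
plot(frangos_head$days, frangos_head$grams,
     xlab = "Days", ylab = "Grams")
```

Leaving the part before or after the comma empty means “all rows” or “all columns”.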



Day two

Day 2: Managing data quality and validation

-Data coding
-Data sharing
-Practical session

Data “coding”

-“Coding” is the act of assigning a (usually numeric) value to a categorical concept.
-Example: Female = 1, Male = 2, Other = 3, Unknown = 4
-Example: Aged 0-5 = 1, Aged 6-18 = 2, Aged 19-45 = 3, Aged 46+ = 4, Unknown = 98
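In R, the two examples above could be sketched as follows. The input vectors are made up for illustration, and the exact handling of boundary ages (for example 5.5 years) is an assumption.

```r
# Decoding: turn numeric sex codes back into readable labels.
sex_code <- c(2, 1, 1, 4, 3)
sex <- factor(sex_code,
              levels = c(1, 2, 3, 4),
              labels = c("Female", "Male", "Other", "Unknown"))
sex

# Coding: turn an age in years into the age-group codes.
age_years <- c(3, 12, 30, 70, NA)
age_group <- cut(age_years,
                 breaks = c(-Inf, 5, 18, 45, Inf),
                 labels = c(1, 2, 3, 4))
age_code <- as.integer(as.character(age_group))
age_code[is.na(age_code)] <- 98L   # 98 = Unknown
age_code
```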

Advantages of coding

-Saves physical space on paper CRFs
-Saves significant time in paper-to-digital data entry
-Forces categorization (not necessarily good)
-Forces a priori thinking about meaningful categorization
-Saves hard-drive space

Do you need to “code”?

-Lots of data capture is now digital
-Categorization is not necessarily good
-Hard-drive space is rarely a limiting issue
-Coding means one more layer between the data and understanding it

If you choose to “code”

-You should have comprehensive data dictionaries: both machine- and human-readable
-Your “levels” should make ordinal/notional sense
-Your categories/codes should be tested prior to deployment

Why a “machine-readable” codebook?

-Automated joins vs. manual recoding
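A minimal sketch of what an automated join looks like: the codebook is itself a small table, and merge() attaches the labels to the data in one step. The `codebook` and `participants` tables are hypothetical.

```r
# A machine-readable codebook: one row per code.
codebook <- data.frame(
  sex_code  = c(1, 2, 3, 4),
  sex_label = c("Female", "Male", "Other", "Unknown"),
  stringsAsFactors = FALSE
)

# Coded study data (toy values).
participants <- data.frame(
  id       = 1:5,
  sex_code = c(2, 1, 1, 4, 3)
)

# Join the labels onto the data, keeping every participant.
decoded <- merge(participants, codebook, by = "sex_code", all.x = TRUE)

# merge() sorts by the join column, so restore the original order.
decoded <- decoded[order(decoded$id), ]
decoded
```

If a code changes, only the codebook table needs editing; the join itself stays the same.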

Data sharing

Kinds of data to share (and not)

-Identifying vs non-identifying information
-Health vs non-health data
-Raw vs processed
-Individual vs aggregated

Practical session

Data aggregation

I.e., turning individual-level data into group-level data

Data anonymization

I.e., making individual-level identifiable data non-identifiable
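A first step towards anonymization can be sketched in three moves: replace names with an arbitrary study ID, coarsen precise values into bands, and keep only the columns that are safe to share. The data below are invented, and these steps alone may not be enough to make real data non-identifiable.

```r
# Toy individual-level data with a direct identifier (name).
participants <- data.frame(
  name   = c("Ana", "Ben", "Cara", "Dev"),
  age    = c(34, 51, 27, 62),
  result = c("positive", "negative", "negative", "positive"),
  stringsAsFactors = FALSE
)

# 1. Replace the name with a study ID, shuffled so it does not
#    follow the order in which people were entered.
participants$study_id <- sample(seq_len(nrow(participants)))

# 2. Coarsen exact age into age bands.
participants$age_band <- cut(participants$age,
                             breaks = c(-Inf, 5, 18, 45, Inf),
                             labels = c("0-5", "6-18", "19-45", "46+"))

# 3. Keep only the non-identifying columns.
shareable <- participants[, c("study_id", "age_band", "result")]
shareable
```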

Practical session: data aggregation

Getting data

www.databrew.cc/frangos.csv
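Once the data are loaded, turning individual-level rows into group-level rows can be sketched with base R’s aggregate(). The `toy` data frame below is a made-up stand-in that borrows column names from frangos; its values are purely illustrative and are not taken from the real file.

```r
# Individual-level data: one row per chick (toy values).
toy <- data.frame(
  diet  = c("corn", "corn", "other", "other"),
  chick = c(1L, 2L, 3L, 4L),
  grams = c(40, 60, 50, 70),
  stringsAsFactors = FALSE
)

# Group-level data: mean weight per diet.
by_diet <- aggregate(grams ~ diet, data = toy, FUN = mean)
by_diet
```

The result has one row per diet instead of one row per chick, so it no longer describes any individual animal.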



Day three

Day 3: Part 1

Study registration

Need to fill out

Information on the protocol

Need to fill out

SAP

Need to fill out

Interim analysis

Need to fill out

Publication

Need to fill out

Data processing

Need to fill out

Data cleaning

Need to fill out

Day 3: Part 2

Statistical analysis (descriptive, exploratory, confirmatory)

Need to fill out

Communication

Need to fill out

Practical session

Need to fill out



08

Getting data

We’re going to use the cism package to get weather data for the FQMA weather station (Maputo).

library(cism)
??get_weather
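If the cism package is not installed, a stand-in `weather` object can be built from R’s built-in airquality dataset so that the exploration steps can still be tried. This is only a substitute under stated assumptions: airquality holds daily New York measurements from May to September 1973, not Maputo data, and the sketch provides only `date` and `temp_max` (converted from Fahrenheit to Celsius), so code that uses `temp_min` or `temp_mean` will not run on it.

```r
# Built-in dataset: daily measurements, New York, May-September 1973.
data(airquality)

# Build a small stand-in with the column names used in these slides.
weather <- data.frame(
  date     = as.Date(sprintf("1973-%02d-%02d",
                             airquality$Month, airquality$Day)),
  temp_max = (airquality$Temp - 32) * 5 / 9   # Fahrenheit to Celsius
)

head(weather)
```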

Getting data

weather <- get_weather(station = 'FQMA', 
                       start_year = 2010,
                       end_year = 2016)

Exploring data

Now that we have our weather data, we can look at it.

head(weather)

Some questions on our data

  1. How many rows are in our data?
  2. How many columns?
  3. What are the names of the columns?

Some questions on our data

# 1. How many rows are in our data?
nrow(weather)
# 2. How many columns?
ncol(weather)
# 3. What are the names of the columns?
colnames(weather)

Questions about specific columns

  1. What is the date range?
  2. What is the maximum temperature?
  3. What is the minimum temperature?
  4. What is the average temperature?

Questions about specific columns

# 1. What is the date range?
range(weather$date)
# 2. What is the maximum temperature?
max(weather$temp_max, na.rm = TRUE)
# 3. What is the minimum temperature?
min(weather$temp_min, na.rm = TRUE)
# 4. What is the average temperature?
mean(weather$temp_mean, na.rm = TRUE)
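The na.rm = TRUE argument matters whenever a column has missing values: without it, a single NA makes the whole result NA. A small sketch with a made-up vector of temperatures:

```r
# Four daily temperatures, one of them missing (toy values).
temps <- c(24.5, NA, 31.0, 27.5)

max(temps)                  # NA, because one value is unknown
max(temps, na.rm = TRUE)    # 31, ignoring the missing value
min(temps, na.rm = TRUE)    # 24.5
mean(temps, na.rm = TRUE)   # average of the three known values
```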

Visualizing our data

Which variables do we have which are numeric and continuous?

How can we visualize these?

Visualizing our data

Which variables do we have which are numeric and continuous?

  • temp_max, temp_mean, temp_min, etc…

How can we visualize these?

  • boxplot, histogram

Boxplot

boxplot(weather$temp_mean)

Histogram

hist(weather$temp_mean)

Creating new variables

Let’s create a variable called “hot”

Creating new variables

weather$hot <- ifelse(weather$temp_max > 30, 'hot', 'not hot')

Creating new variables

head(weather)

Exploring our new variable

table(weather$hot)
hot_table <- table(weather$hot)
hot_prop_table <- prop.table(hot_table)
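To see what table() and prop.table() return, the same steps can be tried on a handful of made-up temperatures, where the counts are easy to check by hand:

```r
# Five toy maximum temperatures.
temp_max <- c(28, 31, 33, 25, 30)

# Same rule as above: strictly more than 30 degrees counts as hot.
hot <- ifelse(temp_max > 30, 'hot', 'not hot')

hot_table <- table(hot)                   # counts: 2 hot, 3 not hot
hot_prop_table <- prop.table(hot_table)   # proportions: 0.4 and 0.6

hot_table
hot_prop_table
```

Note that a day of exactly 30 degrees is classified as “not hot”, because the comparison is > rather than >=.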

Exploring our new variable

hot_table <- table(weather$hot)
hot_prop_table <- prop.table(hot_table)
barplot(hot_table)

Making our plot prettier

barplot(hot_table,
        main = 'Hot days in Maputo')

Making our plot prettier

barplot(hot_table,
        main = 'Hot days in Maputo',
        ylab = 'Number of days')

Making our plot prettier

barplot(hot_table,
        main = 'Hot days in Maputo',
        ylab = 'Number of days',
        xlab = 'Temperature')

Making our plot prettier

barplot(hot_table,
        main = 'Hot days in Maputo',
        ylab = 'Number of days',
        xlab = 'Temperature',
        col = c('red', 'blue'))

Making our plot prettier

barplot(hot_table,
        main = 'Hot days in Maputo',
        ylab = 'Number of days',
        xlab = 'Temperature',
        col = c('red', 'blue'),
        border = 'darkgrey')

Multi-variable plots

Let’s create a plot of date (x-axis) and the maximum temperature

Multi-variable plots

Let’s create a plot of date (x-axis) and the maximum temperature

plot(weather$date,
     weather$temp_max)

Multi-variable plots

Let’s make our plot prettier

Multi-variable plots

Let’s make our plot prettier

plot(weather$date,
     weather$temp_max,
     type = 'l',
     col = 'red',
     xlab = 'Date',
     ylab = 'Maximum temperature',
     main = 'Maximum temperature in Maputo')


